The risk of a credit portfolio depends crucially on correlations between the probability of default 
(PD) in different economic sectors. Often, PD correlations have to be estimated from relatively short 
time series of default rates, and the resulting estimation error hinders the detection of a signal. We 
present statistical evidence that PD correlations are well described by a (one-)factorial model. We 
suggest a method of parameter estimation which avoids in a controlled way the underestimation of 
correlation risk. Empirical evidence is presented that, in the framework of the CreditRisk+ model 
with integrated correlations, this method leads to an increased reliability of the economic capital 
estimate. 



Managing portfolio credit risk in a bank requires a sound and
stable estimation of the loss distribution, with special emphasis
on the high quantiles denoted as Credit Value-at-Risk (CreditVaR).
The difference between the CreditVaR and the expected loss has to
be covered by the economic capital, a scarce resource of each bank.
From a risk management perspective, the definition of industry
sectors allows credit risk to be diversified. The degree to which
this diversification is successful depends on the strength of
correlations between the sectors. Moreover, the correlations
between sector PDs crucially influence the CreditVaR and hence the
economic capital.

In large banks, the concentration risk in industry sectors is a
key risk driver. Recently, several approaches for describing and
modelling concentration risk were discussed [1, 2, 3]. In
CreditRisk+ [4], concentration risk is modelled as a multiplicative
random effect on the PD per counterpart in a given sector. In the
original version of CreditRisk+, the loss distribution is
calculated for independent sector variables. Correlations between
PD fluctuations in different sectors can be integrated into
CreditRisk+ with the method of Bürgisser et al. For the calculation
of the CreditVaR it is important whether input parameters like the
correlation coefficients between sector PDs are known or must be
estimated. In the latter case, this estimation leads to an
additional variability of the target estimate, in our case the
portfolio loss. In this way, uncertainty in the estimation of PD
correlations translates into uncertainty of the economic capital
of a bank.

The estimation of cross-correlations is difficult due to the
"curse of dimensionality": if the length T of the available time
series is comparable to the number K of industry sectors, the
number of estimated correlation coefficients is of the same order
as the number of input data points, with the result of large
estimation errors. A way out of this dilemma is the use of a factor
model with a reduced dimensionality of the parameter space. We
present evidence that the PD correlations for K = 20 industry
sectors are well captured by a one-factor model. Surprisingly, even
the parameter estimation for the one-factor model is subject to
large statistical fluctuations and gives rise to a considerable
uncertainty in the CreditVaR. We discuss these fluctuations in
detail and suggest a bootstrap method which allows us to find an
upper limit for the parameters. We assess the impact of different
conservative estimates on the CreditVaR of a realistic portfolio.
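To make the dimensionality argument concrete, the following count uses the values K = 20 and T = 7 that appear in the analysis of this paper. Counting one loading per sector for the one-factor model is a simplified bookkeeping assumed here for illustration, not a statement taken from the source.

```latex
\begin{align*}
\text{data points:} \quad & K \cdot T = 20 \cdot 7 = 140, \\
\text{independent correlation coefficients:} \quad & \frac{K(K-1)}{2} = \frac{20 \cdot 19}{2} = 190, \\
\text{one-factor loadings:} \quad & K = 20 .
\end{align*}
```

With more free correlation coefficients than observations, the unrestricted estimate cannot be stable, whereas the factor model leaves several observations per parameter.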

Description of data set 

As neither the economic activity nor the probability of default
in a given industry sector is directly observable, we approximate
the PD by the insolvency rate in that sector over the last T years.
The probability of insolvency $PD_{kt}$ of sector k in year t is
calculated as the ratio of the number of insolvencies in that
sector to the total number of companies in the sector.

With the help of insolvency rates, the default probability for a
given company A can be factorized into an individual expected PD
$p_A$ and the sector specific relative PD movement $X_k$ with
expectation $\langle X_k \rangle = 1$ according to

$$P(A \text{ fails}) = p_A X_k \quad \text{with} \quad X_{kt} = \frac{PD_{kt}}{\frac{1}{T} \sum_{t=1}^{T} PD_{kt}} . \qquad (2)$$

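For this study, we use sector specific default histories as
supplied by the federal statistical office of Germany. We analyze
default rates for a segmentation of the economy into 20 sectors and
estimate the sample covariance matrix $S^{\mathrm{emp}}$ and sample
correlation matrix $C^{\mathrm{emp}}$ as

$$S^{\mathrm{emp}}_{ij} = \frac{1}{T-1} \sum_{t=1}^{T} \left(X_{it} - \langle X_i \rangle\right)\left(X_{jt} - \langle X_j \rangle\right), \qquad C^{\mathrm{emp}}_{ij} = \frac{S^{\mathrm{emp}}_{ij}}{\sigma_i \sigma_j} \qquad (3)$$

with $\sigma_i^2 = S^{\mathrm{emp}}_{ii}$.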
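The construction of the relative PD movements and of the sample correlation matrix can be sketched as follows. This is a minimal illustration, not the authors' code: the insolvency counts are invented, the function names are hypothetical, and the 1/(T-1) normalization of the covariance is an assumption (the correlation matrix does not depend on it).

```python
import numpy as np

def relative_pd_movements(insolvencies, companies):
    """Return X[k, t] = PD[k, t] / (time average of PD[k, :])."""
    pd_rates = np.asarray(insolvencies, float) / np.asarray(companies, float)
    return pd_rates / pd_rates.mean(axis=1, keepdims=True)

def sample_cov_and_corr(X):
    """Sample covariance S and correlation C between the rows (sectors) of X."""
    S = np.cov(X, ddof=1)              # rows are sectors, columns are years
    sigma = np.sqrt(np.diag(S))        # per-sector standard deviations
    C = S / np.outer(sigma, sigma)     # C_ij = S_ij / (sigma_i * sigma_j)
    return S, C

# Hypothetical counts for K = 3 sectors over T = 7 years (not the paper's data).
insolvencies = np.array([[12, 15, 11, 18, 20, 16, 14],
                         [30, 34, 29, 41, 44, 38, 33],
                         [5, 4, 6, 7, 9, 6, 5]])
companies = np.array([[1000] * 7, [2500] * 7, [400] * 7])

X = relative_pd_movements(insolvencies, companies)
S_emp, C_emp = sample_cov_and_corr(X)
```

By construction each row of `X` has a time average of exactly 1, which mirrors the expectation of the relative PD movement in Eq. (2).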
Test for independent sectors 

We first ask whether the sample correlation matrix of the PD time
series is compatible with the hypothesis of zero correlations.
Ideas for testing this hypothesis for covariance matrices date back
to the seventies, and were recently generalized to situations where
the number of time series is larger than the sample size [6]. Here,
we use an adaptation of these tests to test whether the correlation
matrix is equal to the unit matrix. The test statistic

$$R = \frac{1}{K} \mathrm{tr}\left[C^2\right] - 1 \qquad (4)$$

for a correlation matrix $C$ is both K- and T-consistent with the
T-limiting distribution $(T-1) K R / 2 \sim \chi^2_{K(K-1)/2}$. The
prefactor $T - 1$ rather than $T$ is chosen to improve the finite-T
properties of the test. For our example with T = 7 and K = 20, we
find R = 5.805, whereas the critical value for $\alpha = 0.05$ is
$R_{\mathrm{crit}} = 3.719$. Hence, independence of the sector PDs
cannot be assumed, and a model describing sector correlations is
needed.
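The critical value quoted above can be checked with a few lines of code. The sketch below evaluates the statistic of Eq. (4) and derives the critical R from the chi-square limit. To stay dependency-free it uses the Wilson-Hilferty approximation for the chi-square quantile, which is a substitution made here for illustration and not necessarily what the authors used. All function names are hypothetical.

```python
from statistics import NormalDist

def r_statistic(C):
    """R = tr(C^2)/K - 1 for a K x K correlation matrix given as a list of rows."""
    K = len(C)
    trace_c2 = sum(C[i][j] * C[j][i] for i in range(K) for j in range(K))
    return trace_c2 / K - 1.0

def chi2_quantile(p, dof):
    """Wilson-Hilferty approximation to the chi-square quantile
    (an assumption of this sketch; it is accurate for large dof)."""
    z = NormalDist().inv_cdf(p)
    a = 2.0 / (9.0 * dof)
    return dof * (1.0 - a + z * a ** 0.5) ** 3

def r_critical(T_eff, K, alpha=0.05):
    """Critical R from (T_eff - 1) * K * R / 2 ~ chi-square with K(K-1)/2 dof."""
    dof = K * (K - 1) // 2
    return 2.0 * chi2_quantile(1.0 - alpha, dof) / ((T_eff - 1) * K)

r_crit = r_critical(7, 20)   # sector time series with T = 7, K = 20
print(f"critical R at alpha = 0.05: {r_crit:.3f}")
```

For an identity matrix `r_statistic` returns zero, so any positive R measures the total squared off-diagonal correlation per sector.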

Description of one factor model 

We diagonalize the empirical cross correlation matrix
$C^{\mathrm{emp}}$ and rank order its eigenvalues,
$\lambda_{i,\mathrm{emp}} < \lambda_{i+1,\mathrm{emp}}$. As we are
interested in modelling correlations rather than covariances, we
normalize the $X_{it}$ such that they all have the same variance,
namely the average variance
$\sigma_X^2 = (1/K) \sum_{i=1}^{K} \sigma_{X_i}^2$, and subtract
the mean:

$$\hat{X}_{it} = \left(X_{it} - \langle X_i \rangle\right) \frac{\sigma_X}{\sigma_{X_i}} . \qquad (5)$$

We use the components of the eigenvector $u^{(K)}_{\mathrm{emp}}$
corresponding to the largest eigenvalue
$\lambda_{K,\mathrm{emp}} = 10.38$ to define a factor time series

$$Y_t = \sum_{i=1}^{K} u^{(K)}_{i,\mathrm{emp}} \hat{X}_{it} . \qquad (6)$$

Compared to averaging the sector variables without prior
information, defining the factor time series from the eigenvector
with the largest eigenvalue ensures that the factor explains a
maximum amount of correlations. The idea arises from factor
analysis. In the context of stock returns, a time series defined
according to the prescription of Eq. (6) was found to agree well
with a value weighted stock index. We expect that the factor time
series Eq. (6) describes economy wide changes of relative PD,
possibly weighted by the economic relevance of the individual
sectors.

We model the correlations between relative PD movements by a
one-factor model

$$\hat{X}_{it} = b_i Y_t + \epsilon_{it} . \qquad (7)$$

The coefficients $\{b_i\}$ are found by performing a linear
regression. To see whether a one-factor model fully describes the
correlations between the $\{\hat{X}_{it}\}$, we apply the test
Eq. (4) to the correlation matrix of the residuals
$\{\epsilon_{it}\}$. Taking into account that the regression
reduces the effective length of the residual time series from T to
T - 1, we find R = 4.409, slightly below the threshold
$R_{\mathrm{crit}} = 4.463$. As the assumption of uncorrelated
residuals is not rejected, no further factors are needed for the
description of correlations.

FIG. 1: The components of the eigenvector $u^{(K)}_{\mathrm{emp}}$
of the empirical correlation matrix (connected full circles) are
almost identical to the components of the eigenvector
$u^{(K)}_{\mathrm{point}}$ of the point estimator
$C^{\mathrm{point}}$ (open circles).

The point estimator can now be calculated under the assumption
that the residuals $\{\epsilon_{it}\}$ are iid observations from
uncorrelated random variables $\epsilon_i$, $i = 1, \ldots, K$,
i.e. $\langle \epsilon_i \epsilon_j \rangle = 0$ for $i \neq j$.
Defining the factor variance
$\sigma_Y^2 = \frac{1}{T-1} \sum_{t=1}^{T} Y_t^2$, one finds the
point estimator for the cross correlation matrix as

$$C^{\mathrm{point}}_{ij} = \delta_{ij} + (1 - \delta_{ij}) \, b_i b_j \, \sigma_Y^2 / \sigma_X^2 . \qquad (8)$$

The largest eigenvalue of $C^{\mathrm{point}}$ is found to be
$\lambda_{K,\mathrm{point}} = 10.66$, in good agreement with the
original largest eigenvalue. In addition, the corresponding
eigenvector $u^{(K)}_{\mathrm{point}}$ is found to be very close to
the original eigenvector (Fig. 1).
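The chain from normalized sector series to the point estimator can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' implementation: the function name is hypothetical, the demo data are synthetic, the eigenvector sign is fixed by an arbitrary convention, and the regression is taken to be ordinary least squares without intercept (both series are centred).

```python
import numpy as np

def one_factor_point_estimator(X):
    """Fit the one-factor model to X (shape K x T) and return (b, C_point, lam_emp)."""
    X = np.asarray(X, float)
    K, T = X.shape
    sd = X.std(axis=1, ddof=1)
    sigma_x = np.sqrt(np.mean(sd ** 2))                  # average variance sigma_X^2
    X_hat = (X - X.mean(axis=1, keepdims=True)) * (sigma_x / sd)[:, None]

    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(X))    # eigenvalues ascending
    u = eigvecs[:, -1]                                   # eigenvector of largest eigenvalue
    if u.sum() < 0:                                      # sign convention (assumption)
        u = -u

    Y = u @ X_hat                                        # factor time series
    b = X_hat @ Y / (Y @ Y)                              # least-squares slopes b_i
    sigma_y2 = (Y @ Y) / (T - 1)                         # factor variance
    C_point = np.outer(b, b) * sigma_y2 / sigma_x ** 2   # off-diagonal part
    np.fill_diagonal(C_point, 1.0)                       # delta_ij on the diagonal
    return b, C_point, eigvals[-1]

# Synthetic demo data: K = 5 sectors, T = 7 years, one common factor.
rng = np.random.default_rng(0)
factor = rng.standard_normal(7)
X_demo = 1.0 + 0.2 * (0.7 * factor + 0.7 * rng.standard_normal((5, 7)))
b, C_point, lam_emp = one_factor_point_estimator(X_demo)
```

Because the factor is built from the leading eigenvector of the same normalized data, the fitted slopes in this sketch coincide with the eigenvector components, which is one way to see why the eigenvectors of the empirical matrix and of the point estimator stay close.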

Fluctuations in empirical correlation matrices: a toy model 

In this section, we use the results of Monte Carlo simulations to
study the relation between the cross correlation matrix
$C^{\mathrm{model}}$ resulting from infinitely long model time
series and matrices $C^{\mathrm{sim}}$ numerically calculated from
finite time series of length T. We find that the
$\{C^{\mathrm{sim}}\}$ differ from $C^{\mathrm{model}}$ both in a
systematic way, for example through a shift of the largest
eigenvalue towards larger values, and in a random way, i.e. an
individual member of the simulated ensemble deviates significantly
from the average [10, 11].

To simplify simulations, we rewrite Eq. (7) as

$$\hat{X}_{it} = \alpha \beta_i F_t + c_i \eta_{it} . \qquad (9)$$

The random variables are rescaled according to
$c_i \eta_{it} = \epsilon_{it}$ and $\alpha \beta_i F_t = b_i Y_t$
such that their variances are $\mathrm{var}(F) = \sigma_X^2$ and
$\mathrm{cov}(\eta_i, \eta_j) = \sigma_X^2 \delta_{ij}$. In
addition, the $\{\beta_i\}$ obey the normalization condition
$\sum_{i=1}^{K} \beta_i^2 = 1$, which makes them comparable to the
components of the eigenvector $u^{(K)}$. The model parameters are
subject to the constraint $c_i^2 = 1 - \alpha^2 \beta_i^2$ in order
to enforce $\mathrm{var}(\hat{X}_{it}) = \sigma_X^2$.

FIG. 2: Distribution of (a) the largest eigenvalue and of (b) all
components of the corresponding eigenvector from simulations of the
one-factor toy model with $\lambda_{K,\mathrm{model}} = 10.38$.

The model is completely defined by i) the parameter $\alpha$
determining the largest eigenvalue, ii) the parameters
$\{\beta_i\}$, and iii) the probability distribution of the random
variables $F$ and $\{\eta_i\}$. As the empirical PD movements are
commonly assumed to follow a gamma distribution, we model the
random variables $F$ and $\eta_i$ by gamma distributions as well
[12]. In addition, we use normal distributions for the random
variables and find that the deviation from a simulation with gamma
distributed variables is smaller than 3%. As the simulation of
Gaussian random variables is computationally much more efficient
than the simulation of gamma distributed variables, we use normally
distributed variables for the computationally demanding
self-consistent calculations described in the next section.

The infinite time series correlation matrix of the model is given
by

$$C^{\mathrm{model}}_{ij} = \delta_{ij} + (1 - \delta_{ij}) \, \alpha^2 \beta_i \beta_j . \qquad (10)$$

In this section, we study the outcome of model simulations for the
particularly simple hypothetical case
$u^{(K)}_{i,\mathrm{model}} = \beta_i = 1/\sqrt{K}$ in order to
gain qualitative insight into the occurring fluctuations. For a
given value of $\alpha$ corresponding to
$\lambda_{K,\mathrm{model}} = 10.38$, we perform 10000 Monte Carlo
simulations of Eq. (9). We compute the pdf for the largest
eigenvalue $\lambda_{K,\mathrm{sim}}$ of $C^{\mathrm{sim}}$ and the
corresponding eigenvector $u^{(K)}_{\mathrm{sim}}$. We find that
both quantities have broad distributions (see Fig. 2). The
distribution of eigenvalues has an average
$\langle \lambda_{K,\mathrm{sim}} \rangle = 10.72$, which is
significantly larger than the true eigenvalue
$\lambda_{K,\mathrm{model}} = 10.38$. In addition, one finds
simulated eigenvalues as low as $\lambda_{K,\mathrm{sim}} = 5$. We
quantify the systematic shift of eigenvalues by the average
$\Delta\lambda = \langle \lambda_{K,\mathrm{sim}} \rangle - \lambda_{K,\mathrm{model}}$.
The magnitude of eigenvalue fluctuations is described by the
standard deviation

$$\sigma_\lambda = \sqrt{\langle \lambda_{K,\mathrm{sim}}^2 \rangle - \langle \lambda_{K,\mathrm{sim}} \rangle^2} . \qquad (11)$$

For the distribution shown in Fig. 2 we find
$\sigma_\lambda = 2.42$.

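There are significant fluctuations of eigenvector components as
well. For theoretical eigenvector components
$u^{(K)}_{i,\mathrm{model}} = 0.224 \ \forall i$ one even finds
negative empirical components indicating spurious anticorrelations,
which would lead to dangerous hedges in credit portfolios.
Specifically, we calculate the standard deviation

$$\sigma_{u_i} = \sqrt{\left\langle \left(u^{(K)}_{i,\mathrm{sim}}\right)^2 \right\rangle - \left\langle u^{(K)}_{i,\mathrm{sim}} \right\rangle^2} \qquad (12)$$

and find $\sigma_{u_i} = 0.083$. Since the
$u^{(K)}_{i,\mathrm{model}}$ do not vary across i, we only need to
estimate one $\sigma_{u_i}$.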
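A compact Monte Carlo sketch of the toy model shows how such fluctuation estimates can be produced. It assumes Gaussian random variables, sets the overall scale of the series to one (the correlation matrix does not depend on it), uses 2000 rather than 10000 runs to stay fast, and fixes the eigenvector sign so that the components sum to a positive value. The function name and all defaults are hypothetical, and the numbers it prints depend on the seed.

```python
import numpy as np

def simulate_toy_model(lam_model=10.38, K=20, T=7, n_sim=2000, seed=0):
    """Simulate the one-factor toy model with uniform beta_i = 1/sqrt(K)."""
    rng = np.random.default_rng(seed)
    beta = np.full(K, 1.0 / np.sqrt(K))
    # For uniform beta the model correlation matrix has largest eigenvalue
    # 1 + (K - 1) * alpha^2 / K, which fixes alpha^2 for a given lam_model.
    alpha2 = (lam_model - 1.0) * K / (K - 1)
    c = np.sqrt(1.0 - alpha2 * beta ** 2)          # enforces unit variance per sector

    lam_max = np.empty(n_sim)
    u_max = np.empty((n_sim, K))
    for s in range(n_sim):
        F = rng.standard_normal(T)                 # common factor
        eta = rng.standard_normal((K, T))          # idiosyncratic noise
        X = np.sqrt(alpha2) * beta[:, None] * F[None, :] + c[:, None] * eta
        vals, vecs = np.linalg.eigh(np.corrcoef(X))
        u = vecs[:, -1]
        lam_max[s] = vals[-1]
        u_max[s] = u if u.sum() >= 0 else -u       # sign convention (assumption)
    return lam_max, u_max

lam_max, u_max = simulate_toy_model()
delta_lambda = lam_max.mean() - 10.38              # systematic eigenvalue shift
sigma_lambda = lam_max.std()                       # eigenvalue fluctuations
sigma_u = u_max.std(axis=0).mean()                 # component fluctuations, averaged over i
print(f"shift {delta_lambda:.2f}, sigma_lambda {sigma_lambda:.2f}, sigma_u {sigma_u:.3f}")
```

Since a correlation matrix estimated from T = 7 observations has rank at most 6 while its trace is K = 20, the largest simulated eigenvalue can never fall below 20/6, which bounds the left tail of the eigenvalue distribution.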

In conclusion, even if the generating process for relative PD
movements is a simple one-factor model, the empirically found
parameters can deviate significantly from the theoretical ones. We
advocate the point of view that the empirical $C^{\mathrm{emp}}$
has to be viewed as a member of such a fluctuating ensemble, in
that its eigenvalues and eigenvectors can deviate significantly
from the unknown "true" correlation matrix of PD movements
[10, 11]. The statistical properties of the ensemble
$\{C^{\mathrm{sim}}\}$ can then be used to derive error bars for
both the largest eigenvalue and the components of the corresponding
eigenvector.

Conservative estimates 

How can we use these results to make a reliable estimate for the
correlation matrix of relative PD movements? A bank needs to act in
a conservative manner to prevent insolvency. Using the empirical
correlation matrix, the bank risks that the correlations are
"accidentally" low. The most conservative approach would be to
assume all correlations to be 1, i.e.
$u^{(K)}_i = 1/\sqrt{K} \ \forall i$. But the model would then
effectively be a one-sector model. Any possibility to measure
concentration risk in certain industry sectors would be lost, and
the model would not encourage diversifying the business across
sectors.

As a controlled mediation we introduce "cases" of add-ons of
$x = 1, 2, 3$ standard deviations to the fluctuating quantities
such that the predicted risk for a portfolio is increased. This
means correcting the eigenvalue towards larger values and the
eigenvector components towards the value
$u^{(K)}_i = 1/\sqrt{K}$, indicating the same correlation strength
for all sectors and the absence of hedge possibilities.

Specifically, we let

$$u^{(K)}_{i,x\sigma} = 1/\sqrt{K} \quad \text{if} \quad \left| u^{(K)}_{i,\mathrm{emp}} - 1/\sqrt{K} \right| < x \cdot \sigma_{u_i} , \qquad (13)$$

and $u^{(K)}_{i,x\sigma} = u^{(K)}_{i,\mathrm{emp}} \pm x \cdot \sigma_{u_i}$
otherwise. The sign is chosen such that the overall risk increases,
i.e. such that $u^{(K)}_{i,x\sigma}$ falls between the empirical
value and $1/\sqrt{K}$. After applying these corrections, the
eigenvector is normalized. For this calculation, we fix the
parameter $\alpha$ such that the simulated largest eigenvalue
$\lambda_{K,\mathrm{sim}}$ is equal to the empirically observed
one. We calculate the $\{\sigma_{u_i}\}$ self-consistently, i.e. we
calculate $\sigma_{u_i}$ for the $u^{(K)}_{\mathrm{case}}$ which
solves Eq. (13). The results are shown in Fig. 3. We see that for
increasing $x = 1, 2, 3$, the model eigenvector comes closer to the
null hypothesis of an eigenvector with identical components. While
the empirical eigenvector has significant negative components
indicating anticorrelations between some of the sectors, the
negative components are already strongly reduced for the smaller
add-ons and completely gone for the larger ones.

FIG. 3: Comparison between the empirical eigenvector
$u^{(K)}_{\mathrm{emp}}$ (diamonds) and the conservative estimates
$u^{(K)}_{1\sigma}$ (circles), $u^{(K)}_{2\sigma}$ (triangles), and
$u^{(K)}_{3\sigma}$ (squares).
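The eigenvector correction described above can be written as a short function. The sketch below is a simplified illustration: it applies a single, fixed value of the component standard deviation to all sectors and does not perform the self-consistent recalculation described in the text. The function name and the 4-sector example vector are hypothetical.

```python
import math

def conservative_eigenvector(u_emp, sigma_u, x):
    """Shift each component by x * sigma_u towards 1/sqrt(K), then renormalize."""
    K = len(u_emp)
    target = 1.0 / math.sqrt(K)
    shifted = []
    for u_i in u_emp:
        if abs(u_i - target) < x * sigma_u:
            shifted.append(target)                # within x sigma: set to 1/sqrt(K)
        elif u_i < target:
            shifted.append(u_i + x * sigma_u)     # below the target: move up
        else:
            shifted.append(u_i - x * sigma_u)     # above the target: move down
    norm = math.sqrt(sum(v * v for v in shifted))
    return [v / norm for v in shifted]

# Hypothetical, approximately normalized eigenvector with one negative component.
u_demo = [0.8, 0.5, 0.3, -0.141]
u_1sigma = conservative_eigenvector(u_demo, sigma_u=0.083, x=1)
```

Choosing the sign from the position relative to the target guarantees that a corrected component never overshoots the uniform value, so increasing x can only move the vector towards the fully correlated limit.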

Similarly, we add a fluctuation margin to the model eigenvalue
[13] such that

$$\lambda_{K,x\sigma} = \lambda_{K,\mathrm{emp}} + x \cdot \sigma_\lambda . \qquad (14)$$

Here, $x$ specifies the width of the confidence interval for the
estimation of $\lambda_{K,\mathrm{model}}$. We perform this
calculation self-consistently, i.e. we calculate $\Delta\lambda$
and $\sigma_\lambda$ for the $\lambda_{K,\mathrm{case}}$ which
solves Eq. (14). We find $\lambda_{K,1\sigma} = 11.17$,
$\lambda_{K,2\sigma} = 12.93$, and $\lambda_{K,3\sigma} = 15.42$.

Economic implications of the different correlation 
matrices 

So far, we have described five different estimates for the cross
correlation matrix, i.e. $C^{\mathrm{emp}}$, $C^{\mathrm{point}}$,
$C^{\mathrm{model}}_{1\sigma}$, $C^{\mathrm{model}}_{2\sigma}$, and
$C^{\mathrm{model}}_{3\sigma}$. To judge the economic implications
of these estimates, we study the differences in the loss
distribution resulting from these correlation estimates. The
CreditVaR is a key quantity in banking when it comes to risk
management. Reduced by the expected loss, it quantifies the capital
needed to prevent insolvency for a given level of security. As
capital is a scarce resource, it must be considered in the pricing
of credit and trading products. Therefore, we quantify the impact
of the different correlation estimates by calculating their
influence on the CreditVaR.

TABLE I: Analysis of CreditVaR for different correlation matrices

    correlation matrix               CreditVaR (in billion Euro)
    $C^{\mathrm{emp}}$               2.872
    $C^{\mathrm{point}}$             2.825
    $C^{\mathrm{model}}_{1\sigma}$   3.172
    $C^{\mathrm{model}}_{2\sigma}$   3.465
    $C^{\mathrm{model}}_{3\sigma}$   3.665

The portfolio we study is realistic - although fictitious - for an
international bank. It consists of 4934 risk units distributed
asymmetrically over 20 sectors with 20 to 500 counterparts per
sector. The total exposure is 70 bn Euro with a largest exposure of
1.5 bn Euro and a smallest exposure of 0.25 mn Euro. The
counterpart specific default probability varies between 0.03% and
7%, and the expected loss for the total portfolio is 373.3 mn Euro.
Table I shows the CreditVaR calculated by using CreditRisk+ and the
method of Bürgisser et al. for integrating correlations.

We note that the use of a one-factor model changes the CreditVaR
by only two percent as compared to the sample cross correlation
matrix. Thus, the one-factor description, together with the gain in
estimation confidence it brings, yields a portfolio risk estimate
compatible with the parameter-free estimate.

Our aim is to estimate a quantile of a probability distribution -
namely the CreditVaR of the portfolio loss distribution. In the
presence of an unknown parameter, it is a well established
statistical result (see [14]) that the use of the point estimate
for the parameter - derived by a model or not - leads to an
underestimation of the quantile estimate. To account for this
additional estimation uncertainty, we add a volatility $\sigma$ to
the parameter estimate, i.e. the correlation matrix. When applying
a one-$\sigma$ estimate, the CreditVaR increases by 400 mn Euro,
for the two-$\sigma$ estimate there is another increase by 300 mn
Euro, and using the three-$\sigma$ estimate the CreditVaR increases
by yet another 200 mn Euro. To put these numbers in perspective, we
note that the CreditVaR without including correlations is found to
be 2.27 bn Euro, and that the assumption of full correlations among
all sectors leads to a CreditVaR of 3.952 bn Euro. Because negative
PD correlations are not plausible from an economic point of view,
the use of the two-$\sigma$ estimate guarantees a sufficient
forecast reliability on the one hand and allows for some guidance
for economic decisions on the other hand.
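The figures quoted in this section can be put side by side with a small calculation. The sketch below takes the CreditVaR values from Table I, subtracts the expected loss to obtain the economic capital, and locates each estimate between the uncorrelated and the fully correlated limit. The dictionary keys and function names are hypothetical labels chosen for this illustration; the assignment of values to matrices follows the row order of Table I.

```python
EXPECTED_LOSS = 0.3733            # bn Euro, expected loss of the portfolio
VAR_UNCORRELATED = 2.27           # bn Euro, CreditVaR without correlations
VAR_FULLY_CORRELATED = 3.952      # bn Euro, CreditVaR with full correlations

# CreditVaR in bn Euro per correlation matrix (hypothetical key names).
credit_var = {
    "C_emp": 2.872,
    "C_point": 2.825,
    "C_1sigma": 3.172,
    "C_2sigma": 3.465,
    "C_3sigma": 3.665,
}

def economic_capital(var):
    """Economic capital = CreditVaR minus expected loss."""
    return var - EXPECTED_LOSS

def position_between_limits(var):
    """0 at the uncorrelated limit, 1 at the fully correlated limit."""
    return (var - VAR_UNCORRELATED) / (VAR_FULLY_CORRELATED - VAR_UNCORRELATED)

for name, var in credit_var.items():
    print(f"{name:9s} capital {economic_capital(var):.3f} bn Euro, "
          f"position {position_between_limits(var):.2f}")

# Relative change between the sample matrix and the one-factor point estimate.
rel_change = (credit_var["C_emp"] - credit_var["C_point"]) / credit_var["C_emp"]
```

Expressing each estimate as a position between the two limits makes it easy to see how much room for diversification a given conservative case still leaves.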

In summary, we have shown that correlations between empirical
default rates for economic sectors are statistically significant
and must be taken into account. We have described these
correlations with a one-factor model and found that this
description reproduces the empirical correlations well. However,
when using the model to generate short time series and calculating
their correlation matrix, one typically observes large statistical
fluctuations in the correlation structure. Due to these
fluctuations, the parameter estimation for a one-factor model is
plagued by large uncertainties. When the model parameters are
estimated in such a way that the empirically observed ones appear
as a worst case scenario, the reliability of the estimate is
increased in a systematic way, leading to a moderately increased
CreditVaR.